Automatic Thesaurus Extraction for Thai Text Retrieval Enhancement

Authors

  • Chaiwat Ketsuwan
  • Nattakan Pengphon
  • Asanee Kawtrakul
Abstract

A thesaurus is one of the most important components of an information retrieval (IR) system. It provides a precise, controlled vocabulary that coordinates document indexing and retrieval, and thereby improves retrieval effectiveness. However, building a thesaurus manually is labor-intensive, and therefore expensive, and the result is hard to update in a timely manner. Consequently, this paper proposes an approach to constructing a Thai thesaurus automatically, called a Thai association thesaurus, based on statistical and natural language processing techniques.

Related resources

Automatic Thai Ontology Construction and Maintenance System

An ontology is an essential resource for enhancing the performance of information processing systems, in tasks such as information integration, document classification in taxonomies, information retrieval, and data cleaning in database systems. This paper proposes three methodologies for Automatic Thai Ontology Construction and Maintenance from a technical corpus, a dictionary, and a thesaurus. For corpus base...

Corpus-based terminology extraction applied to information access

This paper presents an application of corpus-based terminology extraction in interactive information retrieval. In this approach, the terminology obtained in an automatic extraction procedure is used, without any manual revision, to provide retrieval indexes and a “browsing by phrases” facility for document accessing in an interactive retrieval search interface. We argue that the combination of...

Terminology Retrieval: Towards a Synergy between Thesaurus and Free Text Searching

Multilingual Information Retrieval usually forces a choice between free-text indexing and indexing by means of a multilingual thesaurus. However, since the two approaches share the same objectives, synergy between them is possible. This paper presents a retrieval framework that makes use of terminological information in free-text indexing. The Automatic Terminology Extraction task, which is used for thes...

An Enhancement of Thai Text Retrieval Efficiency by Automatic Backward Transliteration

Loan words, which are borrowed from foreign languages, are used in many languages such as Japanese, Chinese, Korean and Thai. They affect Thai Text Retrieval (TTR) systems, leading to inaccurate term weights for indexing and text clustering. Therefore, there is a need for automatic backward transliteration that can solve this problem. In this paper, we propose a hybrid model approa...

A Method for Keyword Extraction and Word Weighting to Improve Persian Text Classification

Due to ever-increasing information expansion and the huge amount of existing unstructured documents, the use of keywords plays a very important role in information retrieval. Because manual extraction of keywords faces various challenges, automated extraction seems inevitable. This research attempts to use a thesaurus (a structured word-net) to extract them automatically. A...

Publication date: 2000